NSF PAR Search | NSF Public Access Repository

Tracing the Evolution of Information Transparency for OpenAI’s GPT Models through a Biographical Approach

https://doi.org/10.1609/aies.v7i1.31757

Xu, Zhihan; Mustafaraj, Eni (October 2024, Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society)
Das, Sanmay; Green, Brian Patrick; Varshney, Kush; Ganapini, Marianna; Renda, Andrea (Ed.)
Information transparency, the open disclosure of information about models, is crucial for proactively evaluating the potential societal harm of large language models (LLMs) and developing effective risk mitigation measures. Adapting the biographies of artifacts and practices (BOAP) method from science and technology studies, this study analyzes the evolution of information transparency within OpenAI’s Generative Pre-trained Transformers (GPT) model reports and usage policies from its inception in 2018 to GPT-4, one of today’s most capable LLMs. To assess the breadth and depth of transparency practices, we develop a 9-dimensional, 3-level analytical framework to evaluate the comprehensiveness and accessibility of information disclosed to various stakeholders. Findings suggest that while model limitations and downstream usages are increasingly clarified, model development processes have become more opaque. Transparency remains minimal in certain aspects, such as model explainability and real-world evidence of LLM impacts, and the discussions on safety measures such as technical interventions and regulation pipelines lack in-depth details. The findings emphasize the need for enhanced transparency to foster accountability and ensure responsible technological innovations.
Full Text Available
YouTube and Conspiracy Theories: A Longitudinal Audit of Information Panels

https://doi.org/10.1145/3648188.3675128

Godinez, Lillie; Mustafaraj, Eni (September 2024, ACM)

Full Text Available
Algorithmic Misjudgement in Google Search Results: Evidence from Auditing the US Online Electoral Information Environment

https://doi.org/10.1145/3630106.3658916

Perreault, Brooke; Lee, Johanna Hoonsun; Shava, Ropafadzo; Mustafaraj, Eni (June 2024, ACM)

Google Search is an important way that people seek information about politics [8], and Google states that it is “committed to providing timely and authoritative information on Google Search to help voters understand, navigate, and participate in democratic processes.” This paper studies the extent to which government-maintained web domains are represented in the online electoral information environment, as captured through 3.45 million Google Search result pages collected during the 2022 US midterm elections for 786 locations across the United States. Focusing on state, county, and local government domains that provide locality-specific information, we study not only the extent to which these sources appear in organic search results, but also the extent to which these sources are correctly targeted to their respective constituents. We label misalignment between the geographic area that non-federal domains serve and the locations for which they appear in search results as algorithmic mistargeting, a subtype of algorithmic misjudgement in which the search algorithm targets locality-specific information to users in different (incorrect) locations. In the context of the 2022 US midterm elections, we find that 71% of all occurrences of state, county, and local government sources were mistargeted, with some domains appearing disproportionately often among organic results despite providing locality-specific information that may not be relevant to all voters. However, we also find that mistargeting often occurs in low ranks. We conclude by considering the potential consequences of extensive mistargeting of non-federal government sources and argue that ensuring the correct targeting of these sources to their respective constituents is a critical part of Google’s role in facilitating access to authoritative and locally-relevant electoral information.
Full Text Available
People, not search-engine algorithms, choose unreliable or partisan news

https://doi.org/10.1038/d41586-023-01634-5

Mustafaraj, Eni (June 2023, Nature)

Analysis of people’s web searches and visited websites suggests that it is more likely that they are choosing to engage with partisan or unreliable news than that they are being unduly exposed to it by search-engine algorithms.
Full Text Available
Identifying the Gaps in the Coverage of Web Domains in Wikipedia and Wikidata for Credibility Assessment Purposes

Lu, Malinda; Mustafaraj, Eni (May 2023, Wiki Workshop (10th edition))

In February 2021, Google Search added a new interface feature to support the evaluation of web domains, known as the “About this result” feature. A prominent part of this feature is a snippet of text pulled automatically from Wikipedia, if a Wikipedia page for the web domain exists. While conducting large-scale audits of Google Search, we discovered that less than 40% of web domains shown in Google Search results have a Wikipedia page. Then, we retrieved their Wikidata entries and looked at the extent to which they incorporate features related to W3C credibility signals. The lack of information for many signals points to avenues for expanding Wikidata coverage.
Full Text Available
Disrupt, Ally, Resist, Embrace (DARE): Action Items for Computational Social Scientists in a Changing World

Jaidka, Kokil; Mustafaraj, Eni; Schoch, David; Joseph, Kenneth (July 2023, ICWSM Workshop Proceedings)

In the past decade, a number of sophisticated AI-powered systems and tools have been developed and released to the scientific community and the public. These technical developments have occurred against a backdrop of political and social upheaval that is both magnifying and magnified by public health and macroeconomic crises. These technical and socio-political changes offer multiple lenses to contextualize (or distort) scientific reflexivity. Further, to computational social scientists who study computer-mediated human behavior, they have implications on what we study and how we study it. How should the ICWSM community engage with this changing world? Which disruptions should we embrace, and which ones should we resist? Whom do we ally with, and for what purpose? In this workshop co-located with ICWSM, we invited experience-based perspectives on these questions with the intent of drafting a collective research agenda for the computational social science community. We did so via the facilitation of collaborative position papers and the discussion of imminent challenges we face in the context of, for example, proprietary large language models, an increasingly unwieldy peer review process, and growing issues in data collection and access. This document presents a summary of the contributions and discussions in the workshop.
Full Text Available
Capturing the Aftermath of the Dobbs v. Jackson Women’s Health Organization Decision in Google Search Results across the U.S.

https://doi.org/10.1609/icwsm.v17i1.22214

Perreault, Brooke; Dau, Lan; Wintner, Anya; Mustafaraj, Eni (June 2023, Proceedings of the International AAAI Conference on Web and Social Media)

How do Google Search results change following an impactful real-world event, such as the U.S. Supreme Court decision on June 24, 2022 to overturn Roe v. Wade? And what do they tell us about the nature of event-driven content, generated by various participants in the online information environment? In this paper, we present a dataset of more than 1.74 million Google Search results pages collected between June 24 and July 17, 2022, intended to capture what Google Search surfaced in response to queries about this event of national importance. These search pages were collected for 65 locations in 13 U.S. states, a mix of red, blue, and purple states, with respect to their voting patterns. We describe the process of building a set of circa 1,700 phrases used for searching Google, how we gathered the search results for each location, and how these results were parsed to extract information about the most frequently encountered web domains. We believe that this dataset, which comprises raw data (search results as HTML files) and processed data (extracted links organized as CSV files) can be used to answer research questions that are of interest to computational social scientists as well as communication and media studies scholars.
Full Text Available
Assessing Google Search’s New Features in Supporting Credibility Judgments of Unknown Websites

https://doi.org/10.1145/3576840.3578277

Wang, Ace; De Jesus Sanchez, Liz Maylin; Wintner, Anya; Zhu, Yuanxin; Mustafaraj, Eni (January 2023, Proceedings of the 2023 Conference on Human Information Interaction and Retrieval)

This study assesses the awareness and perceived utility of two features Google Search introduced in February 2021: “About this result” and “More about this page”. Google stated that the goal of these features is to help users vet unfamiliar web domains (or sources). We investigated whether the features were sufficiently prominent to be detected by frequent users of Google Search, and their perceived utility for making credibility judgments of sources, in one-on-one user studies with 25 undergraduate college students, who identify as frequent users of Google Search. Our results indicate a lack of adoption or awareness of these features by our participants and neutral-positive perceptions of their utility in evaluating web sources. We also examined the perceived usefulness of nine other domain credibility signals collected from the W3C.
Full Text Available
‘Highly Partisan’ and ‘Blatantly Wrong’: Analyzing News Publishers’ Critiques of Google’s Reviewed Claims

Lurie, Emma; Mustafaraj, Eni (October 2020, Proceedings of the 2020 Truth and Trust Online Conference (TTO 2020))
De Cristofaro, Emiliano; Nakov, Preslav (Ed.)
Google’s reviewed claims feature was an early attempt to incorporate additional credibility signals from fact-checking onto the search results page. The feature, which appeared when users searched for the name of a subset of news publishers, was criticized by dozens of publishers for its errors and alleged anticonservative bias. By conducting an audit of news publisher search results and focusing on the critiques of publishers, we find that there is a lack of consensus between fact-checking ecosystem stakeholders that may be important to address in future iterations of public-facing fact-checking tools. In particular, we find that a lack of transparency coupled with a lack of consensus on what makes a fact-check relevant to a news article led to the breakdown of reviewed claims.
Full Text Available
The Media Coverage of the 2020 US Presidential Election Candidates through the Lens of Google’s Top Stories

Kawakami, Anna; Umarova, Khonzoda; Mustafaraj, Eni (June 2020, The 14th International AAAI Conference on Web and Social Media (ICWSM 2020))

Choosing the political party nominees, who will appear on the ballot for the US presidency, is a long process that starts two years before the general election. The news media plays a particular role in this process by continuously covering the state of the race. How can this news coverage be characterized? Given that there are thousands of news organizations, but each of us is exposed to only a few of them, we might be missing most of it. Online news aggregators, which aggregate news stories from a multitude of news sources and perspectives, could provide an important lens for the analysis. One such aggregator is Google’s Top stories, a recent addition to Google’s search result page. For the duration of 2019, we collected the news headlines that Google Top stories displayed for 30 candidates of both US political parties. Our dataset contains 79,903 news story URLs published by 2,168 unique news sources. Our analysis indicates that despite this large number of news sources, there is a very skewed distribution of where the Top stories are originating, with a very small number of sources contributing the majority of stories. We are sharing our dataset so that other researchers can answer questions related to algorithmic curation of news as well as media agenda setting in the context of political elections.
Full Text Available
